In [132]:
from IPython.core.display import HTML
HTML("""
<style>
.container { width: 100% !important; padding-left: 1em; padding-right: 2em; }
div.output_stderr { background: #FFA; }
</style>
""")


Out[132]:

Information-based Learning

Let's review the basic idea of decision trees.

Shannon's model of entropy is a weighted sum of the logs of the probabilities of each of the $l$ possible outcomes when we make a random selection from a set; $s$ below is the base of the logarithm, typically 2.

$$ H(t) = - \sum_{i=1}^l \left( P(t=i)\times \log_s(P(t=i)) \right) $$

The information gain of a descriptive feature can be understood as a measure of the reduction in the overall entropy of a prediction task that is achieved by testing on that feature.

Computing information gain involves the following three equations:

$$ H\left(t, \mathcal{D}\right) = - \sum_{l \in levels(t)} \left( P(t=l) \times \log_2(P(t=l)) \right) $$

$$ rem\left(d,\mathcal{D}\right) = \sum_{l \in levels\left(d\right)} \underbrace{ \frac{|\mathcal{D}_{d=l}|}{|\mathcal{D}|}}_{\text{weighting}} \times \underbrace{H\left(t, \mathcal{D}_{d=l}\right)}_{ \substack{ \text{entropy of}\\ \text{partition }\mathcal{D}_{d=l} } } $$

$$ IG\left(d,\mathcal{D}\right) = H\left(t,\mathcal{D}\right) - rem\left(d, \mathcal{D}\right) $$
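
These three equations translate almost directly into code. Below is a minimal sketch (not a definitive implementation) that computes H, rem and IG for a dataset represented as a list of dictionaries, with the target stored under a given key; the representation, the helper names and the toy data are assumptions made purely for illustration.


In [ ]:
import math
from collections import Counter

def entropy(D, target):
    # H(t, D): entropy of the target feature over the dataset D
    counts = Counter(row[target] for row in D)
    n = len(D)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

def rem_entropy(d, D, target):
    # rem(d, D): weighted entropy remaining after partitioning D on feature d
    partitions = {}
    for row in D:
        partitions.setdefault(row[d], []).append(row)
    n = len(D)
    return sum(len(part) / n * entropy(part, target) for part in partitions.values())

def info_gain(d, D, target):
    # IG(d, D) = H(t, D) - rem(d, D)
    return entropy(D, target) - rem_entropy(d, D, target)

# toy (made-up) data: 'outlook' completely determines 'play', so IG should be 1 bit
D = [{'outlook': 'sunny', 'play': 'no'},
     {'outlook': 'sunny', 'play': 'no'},
     {'outlook': 'rainy', 'play': 'yes'},
     {'outlook': 'rainy', 'play': 'yes'}]
print(info_gain('outlook', D, 'play'))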

Implementing our own ID3

Let's implement a simplistic ID3 algorithm. We assume that each feature can only have two values. Hence we have two branches per node: the left branch for the lower value (or false), the right branch for the higher value (or true).

There is no consideration of efficiency, nor any attempt to avoid redundant computations. The algorithm also won't return the actual decision tree; we just use print statements to indicate what the algorithm does.

However, this could be built out into something more useful...


In [ ]:
### define the information gain function IG(d_feature, D_set)
def IG(d_feature, D_set):
    # placeholder: compute the information gain obtained by splitting D_set on d_feature
    return 0.0

In [ ]:
def splitDataSet(d_feature, D_set):
    # placeholder: partition D_set on d_feature into a left (low/false) and a right (high/true) subset
    Left_set = []
    Right_set = []
    return (Left_set, Right_set)
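
One possible way to flesh out these two stubs, as a sketch only: we assume, as stated above, that every feature is binary and that each sample is a list of 0/1 feature values with its class label in the last position. That data layout is an assumption chosen for illustration, not something fixed by the stubs.


In [ ]:
import math

def label_entropy(labels):
    # Shannon entropy of a list of class labels
    n = len(labels)
    result = 0.0
    for lab in set(labels):
        p = labels.count(lab) / n
        result -= p * math.log2(p)
    return result

def splitDataSet(d_feature, D_set):
    # left branch: feature value 0 (false); right branch: feature value 1 (true)
    Left_set = [s for s in D_set if s[d_feature] == 0]
    Right_set = [s for s in D_set if s[d_feature] == 1]
    return (Left_set, Right_set)

def IG(d_feature, D_set):
    # information gain of splitting D_set on the binary feature d_feature
    H_before = label_entropy([s[-1] for s in D_set])
    (L_set, R_set) = splitDataSet(d_feature, D_set)
    rem = 0.0
    for part in (L_set, R_set):
        if part:
            rem += len(part) / len(D_set) * label_entropy([s[-1] for s in part])
    return H_before - rem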

In [ ]:
def myID3(list_of_features, D_set):
    if len(list_of_features) == 0:
        # no features left to split on: we're done
        print("leaf:", D_set)
    elif len(D_set) <= 1:
        # the partition has at most one sample: we're done
        print("leaf:", D_set)
    else:
        # which feature has the greatest information gain on D_set?
        fmax = max(list_of_features, key=lambda f: IG(f, D_set))
        print("splitting on feature", fmax)
        (L_set, R_set) = splitDataSet(fmax, D_set)
        remfeatures = [f for f in list_of_features if f != fmax]
        myID3(remfeatures, L_set)
        myID3(remfeatures, R_set)


Alternative Metrics

  • Entropy-based information gain prefers features with many values.
  • One way of addressing this issue is to use information gain ratio which is computed by dividing the information gain of a feature by the amount of information used to determine the value of the feature:
$$ GR\left(d,\mathcal{D}\right) = \frac{IG\left(d,\mathcal{D}\right)}{- \displaystyle \sum_{l \in levels\left(d\right)} \left(P(d=l) \times \log_2(P(d=l)) \right)} $$
  • Another commonly used measure of impurity is the Gini index:
$$ Gini\left(t,\mathcal{D}\right) = 1 - \sum_{l \in levels(t)} P(t=l)^2 $$
  • The Gini index can be thought of as calculating how often you would misclassify an instance in the dataset if you classified it based on the distribution of classifications in the dataset.
  • Information gain can be calculated using the Gini index by replacing the entropy measure with the Gini index; a short sketch of both alternative measures follows this list.
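
As a rough sketch (assuming class labels and feature values are given as plain Python lists, and taking the information gain itself as already computed), the two alternative measures could be implemented like this:


In [ ]:
import math
from collections import Counter

def gini(labels):
    # Gini index: 1 minus the sum of squared class probabilities
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def value_entropy(values):
    # Shannon entropy of a list of values (class labels or feature levels)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(information_gain, feature_values):
    # GR(d, D): information gain divided by the entropy of the feature's own values
    return information_gain / value_entropy(feature_values)

print(gini(['a', 'a', 'b', 'b']))   # 0.5: maximal impurity for two balanced classes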

Predicting Continuous Targets

  • Regression trees are constructed so as to reduce the variance in the set of training examples at each of the leaf nodes in the tree
  • We can do this by adapting the ID3 algorithm to use a measure of variance rather than a measure of classification impurity (entropy) when selecting the best attribute
  • The impurity (variance) at a node can be calculated using the following equation: $$ var\left(t, \mathcal{D}\right) = \frac{\sum_{i=1}^n \left( t_i - \bar{t}\right)^2}{n-1} $$

  • We select the feature to split on at a node by selecting the feature that minimizes the weighted variance across the resulting partitions (a short sketch in code follows the equation below):

$$ \mathbf{d}[best] = \textrm{argmin}_{d \in \mathbf{d}} \sum_{l \in levels(d)} \frac{|\mathcal{D}_{d=l}|}{|\mathcal{D}|} \times var(t, \mathcal{D}_{d=l}) $$
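
A minimal sketch of this selection rule, assuming the dataset is given as a list of (feature-dict, target-value) pairs; this representation and the helper names are assumptions made for illustration:


In [ ]:
def variance(targets):
    # sample variance of the target values at a node (n - 1 in the denominator)
    n = len(targets)
    if n < 2:
        return 0.0
    mean = sum(targets) / n
    return sum((t - mean) ** 2 for t in targets) / (n - 1)

def weighted_variance(d, D):
    # weighted variance of the target across the partitions induced by feature d
    partitions = {}
    for features, target in D:
        partitions.setdefault(features[d], []).append(target)
    n = len(D)
    return sum(len(ts) / n * variance(ts) for ts in partitions.values())

def best_split_feature(candidate_features, D):
    # d[best]: the candidate feature with the smallest weighted variance
    return min(candidate_features, key=lambda d: weighted_variance(d, D))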

Decision Trees in SciKit-learn

The scikit-learn package provides an implementation of decision tree classifiers. The following example demonstrates the use of the library on the very simple Iris dataset.

This is an excerpt from the online documentation:

As with other classifiers, DecisionTreeClassifier takes as input two arrays: an array X, sparse or dense, of size [n_samples, n_features] holding the training samples, and an array Y of integer values, size [n_samples], holding the class labels for the training samples.

Note: In this environment we represent each sample as a row, and features are columns. This is not always the case!


In [133]:
from sklearn import tree
X = [[0, 0], [1, 1]]
Y = [0, 1]
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)

After being fitted, the model can then be used to predict the class of samples:


In [137]:
clf.predict([[1., 0.], [0, 0], [3, 0]])


Out[137]:
array([0, 0, 0])

DecisionTreeClassifier is capable of both binary (where the labels are [-1, 1]) classification and multiclass (where the labels are [0, ..., K-1]) classification.

Using the Iris dataset, we can construct a tree as follows:


In [138]:
from sklearn.datasets import load_iris
from sklearn import tree
iris = load_iris()
clf = tree.DecisionTreeClassifier()
clf = clf.fit(iris.data, iris.target)

In [139]:
iris.data


Out[139]:
array([[ 5.1,  3.5,  1.4,  0.2],
       [ 4.9,  3. ,  1.4,  0.2],
       [ 4.7,  3.2,  1.3,  0.2],
       [ 4.6,  3.1,  1.5,  0.2],
       [ 5. ,  3.6,  1.4,  0.2],
       [ 5.4,  3.9,  1.7,  0.4],
       [ 4.6,  3.4,  1.4,  0.3],
       [ 5. ,  3.4,  1.5,  0.2],
       [ 4.4,  2.9,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5.4,  3.7,  1.5,  0.2],
       [ 4.8,  3.4,  1.6,  0.2],
       [ 4.8,  3. ,  1.4,  0.1],
       [ 4.3,  3. ,  1.1,  0.1],
       [ 5.8,  4. ,  1.2,  0.2],
       [ 5.7,  4.4,  1.5,  0.4],
       [ 5.4,  3.9,  1.3,  0.4],
       [ 5.1,  3.5,  1.4,  0.3],
       [ 5.7,  3.8,  1.7,  0.3],
       [ 5.1,  3.8,  1.5,  0.3],
       [ 5.4,  3.4,  1.7,  0.2],
       [ 5.1,  3.7,  1.5,  0.4],
       [ 4.6,  3.6,  1. ,  0.2],
       [ 5.1,  3.3,  1.7,  0.5],
       [ 4.8,  3.4,  1.9,  0.2],
       [ 5. ,  3. ,  1.6,  0.2],
       [ 5. ,  3.4,  1.6,  0.4],
       [ 5.2,  3.5,  1.5,  0.2],
       [ 5.2,  3.4,  1.4,  0.2],
       [ 4.7,  3.2,  1.6,  0.2],
       [ 4.8,  3.1,  1.6,  0.2],
       [ 5.4,  3.4,  1.5,  0.4],
       [ 5.2,  4.1,  1.5,  0.1],
       [ 5.5,  4.2,  1.4,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 5. ,  3.2,  1.2,  0.2],
       [ 5.5,  3.5,  1.3,  0.2],
       [ 4.9,  3.1,  1.5,  0.1],
       [ 4.4,  3. ,  1.3,  0.2],
       [ 5.1,  3.4,  1.5,  0.2],
       [ 5. ,  3.5,  1.3,  0.3],
       [ 4.5,  2.3,  1.3,  0.3],
       [ 4.4,  3.2,  1.3,  0.2],
       [ 5. ,  3.5,  1.6,  0.6],
       [ 5.1,  3.8,  1.9,  0.4],
       [ 4.8,  3. ,  1.4,  0.3],
       [ 5.1,  3.8,  1.6,  0.2],
       [ 4.6,  3.2,  1.4,  0.2],
       [ 5.3,  3.7,  1.5,  0.2],
       [ 5. ,  3.3,  1.4,  0.2],
       [ 7. ,  3.2,  4.7,  1.4],
       [ 6.4,  3.2,  4.5,  1.5],
       [ 6.9,  3.1,  4.9,  1.5],
       [ 5.5,  2.3,  4. ,  1.3],
       [ 6.5,  2.8,  4.6,  1.5],
       [ 5.7,  2.8,  4.5,  1.3],
       [ 6.3,  3.3,  4.7,  1.6],
       [ 4.9,  2.4,  3.3,  1. ],
       [ 6.6,  2.9,  4.6,  1.3],
       [ 5.2,  2.7,  3.9,  1.4],
       [ 5. ,  2. ,  3.5,  1. ],
       [ 5.9,  3. ,  4.2,  1.5],
       [ 6. ,  2.2,  4. ,  1. ],
       [ 6.1,  2.9,  4.7,  1.4],
       [ 5.6,  2.9,  3.6,  1.3],
       [ 6.7,  3.1,  4.4,  1.4],
       [ 5.6,  3. ,  4.5,  1.5],
       [ 5.8,  2.7,  4.1,  1. ],
       [ 6.2,  2.2,  4.5,  1.5],
       [ 5.6,  2.5,  3.9,  1.1],
       [ 5.9,  3.2,  4.8,  1.8],
       [ 6.1,  2.8,  4. ,  1.3],
       [ 6.3,  2.5,  4.9,  1.5],
       [ 6.1,  2.8,  4.7,  1.2],
       [ 6.4,  2.9,  4.3,  1.3],
       [ 6.6,  3. ,  4.4,  1.4],
       [ 6.8,  2.8,  4.8,  1.4],
       [ 6.7,  3. ,  5. ,  1.7],
       [ 6. ,  2.9,  4.5,  1.5],
       [ 5.7,  2.6,  3.5,  1. ],
       [ 5.5,  2.4,  3.8,  1.1],
       [ 5.5,  2.4,  3.7,  1. ],
       [ 5.8,  2.7,  3.9,  1.2],
       [ 6. ,  2.7,  5.1,  1.6],
       [ 5.4,  3. ,  4.5,  1.5],
       [ 6. ,  3.4,  4.5,  1.6],
       [ 6.7,  3.1,  4.7,  1.5],
       [ 6.3,  2.3,  4.4,  1.3],
       [ 5.6,  3. ,  4.1,  1.3],
       [ 5.5,  2.5,  4. ,  1.3],
       [ 5.5,  2.6,  4.4,  1.2],
       [ 6.1,  3. ,  4.6,  1.4],
       [ 5.8,  2.6,  4. ,  1.2],
       [ 5. ,  2.3,  3.3,  1. ],
       [ 5.6,  2.7,  4.2,  1.3],
       [ 5.7,  3. ,  4.2,  1.2],
       [ 5.7,  2.9,  4.2,  1.3],
       [ 6.2,  2.9,  4.3,  1.3],
       [ 5.1,  2.5,  3. ,  1.1],
       [ 5.7,  2.8,  4.1,  1.3],
       [ 6.3,  3.3,  6. ,  2.5],
       [ 5.8,  2.7,  5.1,  1.9],
       [ 7.1,  3. ,  5.9,  2.1],
       [ 6.3,  2.9,  5.6,  1.8],
       [ 6.5,  3. ,  5.8,  2.2],
       [ 7.6,  3. ,  6.6,  2.1],
       [ 4.9,  2.5,  4.5,  1.7],
       [ 7.3,  2.9,  6.3,  1.8],
       [ 6.7,  2.5,  5.8,  1.8],
       [ 7.2,  3.6,  6.1,  2.5],
       [ 6.5,  3.2,  5.1,  2. ],
       [ 6.4,  2.7,  5.3,  1.9],
       [ 6.8,  3. ,  5.5,  2.1],
       [ 5.7,  2.5,  5. ,  2. ],
       [ 5.8,  2.8,  5.1,  2.4],
       [ 6.4,  3.2,  5.3,  2.3],
       [ 6.5,  3. ,  5.5,  1.8],
       [ 7.7,  3.8,  6.7,  2.2],
       [ 7.7,  2.6,  6.9,  2.3],
       [ 6. ,  2.2,  5. ,  1.5],
       [ 6.9,  3.2,  5.7,  2.3],
       [ 5.6,  2.8,  4.9,  2. ],
       [ 7.7,  2.8,  6.7,  2. ],
       [ 6.3,  2.7,  4.9,  1.8],
       [ 6.7,  3.3,  5.7,  2.1],
       [ 7.2,  3.2,  6. ,  1.8],
       [ 6.2,  2.8,  4.8,  1.8],
       [ 6.1,  3. ,  4.9,  1.8],
       [ 6.4,  2.8,  5.6,  2.1],
       [ 7.2,  3. ,  5.8,  1.6],
       [ 7.4,  2.8,  6.1,  1.9],
       [ 7.9,  3.8,  6.4,  2. ],
       [ 6.4,  2.8,  5.6,  2.2],
       [ 6.3,  2.8,  5.1,  1.5],
       [ 6.1,  2.6,  5.6,  1.4],
       [ 7.7,  3. ,  6.1,  2.3],
       [ 6.3,  3.4,  5.6,  2.4],
       [ 6.4,  3.1,  5.5,  1.8],
       [ 6. ,  3. ,  4.8,  1.8],
       [ 6.9,  3.1,  5.4,  2.1],
       [ 6.7,  3.1,  5.6,  2.4],
       [ 6.9,  3.1,  5.1,  2.3],
       [ 5.8,  2.7,  5.1,  1.9],
       [ 6.8,  3.2,  5.9,  2.3],
       [ 6.7,  3.3,  5.7,  2.5],
       [ 6.7,  3. ,  5.2,  2.3],
       [ 6.3,  2.5,  5. ,  1.9],
       [ 6.5,  3. ,  5.2,  2. ],
       [ 6.2,  3.4,  5.4,  2.3],
       [ 5.9,  3. ,  5.1,  1.8]])

In [140]:
iris.target


Out[140]:
array([0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
       0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
       2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])

Once trained, we can export the tree in Graphviz format using the export_graphviz exporter. Below is an example export of a tree trained on the entire iris dataset.

To render the tree as an inline image, the pydot package is required (it relies on the Graphviz tools under the hood).


In [141]:
from sklearn.externals.six import StringIO  
import pydot 
from IPython.display import Image  

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
    feature_names=iris.feature_names) 
graph = pydot.graph_from_dot_data(dot_data.getvalue()) 
graph.write_pdf("iris.pdf") 
Image(graph.create_png())


Out[141]:
(inline image: the decision tree fitted to the iris data)
In [142]:
import numpy as np
import matplotlib.pyplot as plt

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Parameters
n_classes = 3
plot_colors = "bry"
plot_step = 0.02

# Load data
iris = load_iris()

for pairidx, pair in enumerate([[0, 1], [0, 2], [0, 3],
                                [1, 2], [1, 3], [2, 3]]):
    # We only take the two corresponding features
    X = iris.data[:, pair]
    y = iris.target

    # Shuffle
    idx = np.arange(X.shape[0])
    np.random.seed(13)
    np.random.shuffle(idx)
    X = X[idx]
    y = y[idx]

    # Standardize
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    X = (X - mean) / std

    # Train
    clf = DecisionTreeClassifier().fit(X, y)

    # Plot the decision boundary
    plt.subplot(2, 3, pairidx + 1)

    x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1
    y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1
    xx, yy = np.meshgrid(np.arange(x_min, x_max, plot_step),
                         np.arange(y_min, y_max, plot_step))

    Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])
    Z = Z.reshape(xx.shape)
    cs = plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)

    plt.xlabel(iris.feature_names[pair[0]])
    plt.ylabel(iris.feature_names[pair[1]])
    plt.axis("tight")

    # Plot the training points
    for i, color in zip(range(n_classes), plot_colors):
        idx = np.where(y == i)
        plt.scatter(X[idx, 0], X[idx, 1], c=color, label=iris.target_names[i],
                    cmap=plt.cm.Paired)

    plt.axis("tight")

plt.suptitle("Decision surface of a decision tree using paired features")
plt.legend()
plt.show()


Regression with Decision Trees in SciKit-learn

A 1D regression with a decision tree: the tree is used to fit a sine curve with additional noisy observations. As a result, it learns piecewise-constant approximations of the sine curve.

We can see that if the maximum depth of the tree (controlled by the max_depth parameter) is set too high, the decision tree learns overly fine details of the training data, including the noise, i.e. it overfits.


In [ ]:
# Import the necessary modules and libraries
import numpy as np
from sklearn.tree import DecisionTreeRegressor
import matplotlib.pyplot as plt
%matplotlib inline

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(5 * rng.rand(80, 1), axis=0)
y = np.sin(X).ravel()
y[::5] += 3 * (0.5 - rng.rand(16))

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_1.fit(X, y)
regr_2.fit(X, y)

# Predict
X_test = np.arange(0.0, 5.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)

# Plot the results
plt.figure()
plt.scatter(X, y, c="k", label="data")
plt.plot(X_test, y_1, c="g", label="max_depth=2", linewidth=2)
plt.plot(X_test, y_2, c="r", label="max_depth=5", linewidth=2)
plt.xlabel("data")
plt.ylabel("target")
plt.title("Decision Tree Regression")
plt.legend()
plt.show()

Another Canned Example

In [ ]:
print(__doc__)

import numpy as np
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor

# Create a random dataset
rng = np.random.RandomState(1)
X = np.sort(200 * rng.rand(100, 1) - 100, axis=0)
y = np.array([np.pi * np.sin(X).ravel(), np.pi * np.cos(X).ravel()]).T
y[::5, :] += (0.5 - rng.rand(20, 2))

# Fit regression model
regr_1 = DecisionTreeRegressor(max_depth=2)
regr_2 = DecisionTreeRegressor(max_depth=5)
regr_3 = DecisionTreeRegressor(max_depth=8)
regr_1.fit(X, y)
regr_2.fit(X, y)
regr_3.fit(X, y)

# Predict
X_test = np.arange(-100.0, 100.0, 0.01)[:, np.newaxis]
y_1 = regr_1.predict(X_test)
y_2 = regr_2.predict(X_test)
y_3 = regr_3.predict(X_test)

# Plot the results
plt.figure()
plt.scatter(y[:, 0], y[:, 1], c="k", label="data")
plt.scatter(y_1[:, 0], y_1[:, 1], c="g", label="max_depth=2")
plt.scatter(y_2[:, 0], y_2[:, 1], c="r", label="max_depth=5")
plt.scatter(y_3[:, 0], y_3[:, 1], c="b", label="max_depth=8")
plt.xlim([-6, 6])
plt.ylim([-6, 6])
plt.xlabel("data")
plt.ylabel("target")
plt.title("Multi-output Decision Tree Regression")
plt.legend()
plt.show()


Data Files

Here are a couple of data files from the UCI Machine Learning Repository.

Mushroom is a dataset with over 8,000 samples of categorical mushroom attributes. The classifier's task is to determine whether a mushroom is edible or not.
https://archive.ics.uci.edu/ml/datasets/Mushroom
https://archive.ics.uci.edu/ml/machine-learning-databases/mushroom/

Car Evaluation is a dataset with about 1,700 samples evaluating cars as "unacceptable", "acceptable", "good", or "very good".
https://archive.ics.uci.edu/ml/datasets/Car+Evaluation
https://archive.ics.uci.edu/ml/machine-learning-databases/car/


In [143]:
dataset = []
for l in open("car.data").readlines():
    dataset.append(l[0:-1].split(','))

dataset[0:9]


Out[143]:
[['vhigh', 'vhigh', '2', '2', 'small', 'low', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'small', 'med', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'small', 'high', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'med', 'low', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'med', 'med', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'med', 'high', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'big', 'low', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'big', 'med', 'unacc'],
 ['vhigh', 'vhigh', '2', '2', 'big', 'high', 'unacc']]

Unfortunately, the scikit-learn implementation needs numerical data, so we have to convert the categorical labels into numbers.

First, we determine the levels of each feature. Then we create a dictionary to convert the labels into numbers.


In [144]:
Nfeature = len(dataset[0])
Nsample = len(dataset)
print("%s x %s" % (Nsample, Nfeature))


1728 x 7

In [ ]:
## http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.LabelEncoder.html
## use LabelEncoder instead of the home grown...
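
As the comment above suggests, scikit-learn's LabelEncoder could replace the hand-rolled dictionaries built below. A minimal sketch (one encoder per column) might look like the following; note that the numeric codes LabelEncoder assigns (based on the sorted order of the levels) will generally differ from those of the home-grown mapping:


In [ ]:
from sklearn.preprocessing import LabelEncoder

# one encoder per column of the car dataset
columns = [[row[c] for row in dataset] for c in range(Nfeature)]
encoders = [LabelEncoder().fit(col) for col in columns]

# encode every column, then re-assemble the rows
encoded_cols = [enc.transform(col) for enc, col in zip(encoders, columns)]
numdata_le = [[int(encoded_cols[c][r]) for c in range(Nfeature)]
              for r in range(Nsample)]
numdata_le[0:3]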

In [145]:
levdict = [{} for c in range(0, Nfeature)]
for c in range(0, Nfeature):
    vec = []
    for r in range(0, Nsample):
        vec.append(dataset[r][c])
    levels = list(set(vec))
    for k in range(0, len(levels)):
        levdict[c][levels[k]] = k

levdict


Out[145]:
[{'high': 2, 'low': 1, 'med': 0, 'vhigh': 3},
 {'high': 2, 'low': 1, 'med': 0, 'vhigh': 3},
 {'2': 3, '3': 2, '4': 1, '5more': 0},
 {'2': 2, '4': 0, 'more': 1},
 {'big': 1, 'med': 0, 'small': 2},
 {'high': 2, 'low': 0, 'med': 1},
 {'acc': 2, 'good': 1, 'unacc': 3, 'vgood': 0}]

In [146]:
numdata = [ [ levdict[c][dataset[r][c]] for c in range(Nfeature)] for r in range(Nsample) ]
numdata[0:9]


Out[146]:
[[3, 3, 3, 2, 2, 0, 3],
 [3, 3, 3, 2, 2, 1, 3],
 [3, 3, 3, 2, 2, 2, 3],
 [3, 3, 3, 2, 0, 0, 3],
 [3, 3, 3, 2, 0, 1, 3],
 [3, 3, 3, 2, 0, 2, 3],
 [3, 3, 3, 2, 1, 0, 3],
 [3, 3, 3, 2, 1, 1, 3],
 [3, 3, 3, 2, 1, 2, 3]]

In [147]:
X = [] # attributes
Y = [] # target
for sample in numdata:
    X.append(sample[0:5])   # the first five descriptive features; column 5 ('safety') is not included by this slice
    Y.append(sample[6])     # the encoded class label
    
[ X[0:9], Y[0:9] ]


Out[147]:
[[[3, 3, 3, 2, 2],
  [3, 3, 3, 2, 2],
  [3, 3, 3, 2, 2],
  [3, 3, 3, 2, 0],
  [3, 3, 3, 2, 0],
  [3, 3, 3, 2, 0],
  [3, 3, 3, 2, 1],
  [3, 3, 3, 2, 1],
  [3, 3, 3, 2, 1]],
 [3, 3, 3, 3, 3, 3, 3, 3, 3]]

In [148]:
clf = tree.DecisionTreeClassifier()
clf = clf.fit(X, Y)    
clf


Out[148]:
DecisionTreeClassifier(class_weight=None, criterion='gini', max_depth=None,
            max_features=None, max_leaf_nodes=None, min_samples_leaf=1,
            min_samples_split=2, min_weight_fraction_leaf=0.0,
            random_state=None, splitter='best')

In [149]:
car_feature_names = ["bying", "maint", "doors", "person", "lug_boot", "safety"]

dot_data = StringIO()
tree.export_graphviz(clf, out_file=dot_data,
    feature_names=car_feature_names) 
graph = pydot.graph_from_dot_data(dot_data.getvalue()) 
graph.write_pdf("cars.pdf") 
###Image(graph.create_png())


Out[149]:
True

Verify what we have

Let's find out how well the classifier predicts its own training data. We use the predict method.


In [150]:
Ncategory = len(levdict[6])
CM = [ [0 for c in range(Ncategory)] for r in range(Ncategory) ]
CM


Out[150]:
[[0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]

In [151]:
Yhat = clf.predict(X)
CM = [ [0 for c in range(Ncategory)] for r in range(Ncategory) ]
for k in range(Nsample):
    CM[Yhat[k]][Y[k]] +=1

CM


Out[151]:
[[65, 39, 26, 65], [0, 30, 30, 30], [0, 0, 248, 124], [0, 0, 80, 991]]

In [152]:
for c in CM:
    print(c)


[65, 39, 26, 65]
[0, 30, 30, 30]
[0, 0, 248, 124]
[0, 0, 80, 991]
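
For comparison, scikit-learn ships its own helpers in sklearn.metrics. Note that confusion_matrix puts the true labels on the rows and the predictions on the columns, i.e. the transpose of the matrix built by hand above:


In [ ]:
from sklearn.metrics import confusion_matrix, accuracy_score

print(confusion_matrix(Y, Yhat))   # rows: true class, columns: predicted class
print("training accuracy:", accuracy_score(Y, Yhat))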
